We present hierarchical policy blending as optimal transport (HiPBOT). This hierarchical framework adapts the weights of low-level reactive expert policies, adding a look-ahead planning layer on the parameter space of a product of expert policies and agents. Our high-level planner realizes a policy blending via unbalanced optimal transport, consolidating the scaling of underlying Riemannian motion policies, effectively adjusting their Riemannian matrix, and deciding over the priorities between experts and agents, guaranteeing safety and task success. Our experimental results in a range of application scenarios from low-dimensional navigation to high-dimensional whole-body control showcase the efficacy and efficiency of HiPBOT, which outperforms state-of-the-art baselines that either perform probabilistic inference or define a tree structure of experts, paving the way for new applications of optimal transport to robot control. More material at https://sites.google.com/view/hipobot
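Safety is a crucial property of every robotic platform: any control policy should always comply with actuator limits and avoid collisions with the environment and humans. In reinforcement learning, safety is even more fundamental for exploring an environment without causing any damage. While there are many proposed solutions to the safe exploration problem, only a few of them can deal with the complexity of the real world. This paper introduces a new formulation of safe exploration for reinforcement learning of various robotic tasks. Our approach applies to a wide class of robotic platforms and enforces safety even under complex collision constraints learned from data by exploring the tangent space of the constraint manifold. Our proposed approach achieves state-of-the-art performance in simulated high-dimensional and dynamic tasks while avoiding collisions with the environment. We demonstrate safe real-world deployment on a TIAGo++ robot, achieving remarkable performance in manipulation and human-robot interaction tasks.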
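Multi-objective, high-dimensional motion optimization problems are ubiquitous in robotics and benefit greatly from informative gradients. To this end, we require all cost functions to be differentiable. We propose learning task-space, data-driven cost functions as diffusion models. Diffusion models represent expressive multimodal distributions and exhibit proper gradients over the entire space. We exploit these properties for motion optimization by integrating the learned cost functions with other, potentially learned or hand-tuned, costs in a single objective function and optimizing all of them jointly by gradient descent. We showcase the benefits of joint optimization in a set of complex grasp and motion planning problems and compare against approaches that decouple grasp selection from motion optimization.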
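The joint optimization described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the learned diffusion-model cost is replaced here by a simple quadratic goal cost, the hand-tuned cost is a soft obstacle penalty, and all names, positions, weights, and step sizes (`GOAL`, `OBSTACLE`, `RADIUS`, `optimize`) are hypothetical. It only shows several differentiable costs being summed into one objective and minimized together by gradient descent.

```python
import numpy as np

GOAL = np.array([2.0, 0.0])       # hypothetical target position
OBSTACLE = np.array([1.0, 0.3])   # hypothetical obstacle centre
RADIUS = 0.5                      # hypothetical obstacle radius


def goal_grad(x):
    """Gradient of 0.5 * ||x - GOAL||^2 (stand-in for a learned cost)."""
    return x - GOAL


def obstacle_grad(x):
    """Gradient of 0.5 * max(0, RADIUS - d)^2 with d = ||x - OBSTACLE||."""
    diff = x - OBSTACLE
    d = np.linalg.norm(diff)
    if d >= RADIUS or d == 0.0:
        return np.zeros_like(x)
    return -(RADIUS - d) * diff / d


def optimize(x0, weights=(1.0, 20.0), step=0.05, iters=500):
    """Gradient descent on the single weighted objective; returns all iterates."""
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(iters):
        grad = weights[0] * goal_grad(x) + weights[1] * obstacle_grad(x)
        x = x - step * grad
        path.append(x.copy())
    return np.array(path)


path = optimize([0.0, 0.0])
final_point = path[-1]
```

Because both terms contribute to every update, the iterate bends around the obstacle while still being pulled toward the goal, rather than first choosing a goal and then planning around obstacles in a separate stage.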
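In this paper, we focus on the problem of integrating energy-based models (EBMs) as guiding priors for motion optimization. EBMs are a class of neural networks that can represent expressive probability density distributions in terms of a Gibbs distribution parameterized by a suitable energy function. Due to their implicit nature, they can easily be integrated as optimization factors or as initial sampling distributions in the motion optimization problem, making them good candidates for integrating data-driven priors into motion optimization. In this work, we present a set of modeling and algorithmic choices required to adapt EBMs to motion optimization. We investigate the benefit of including additional regularizers when learning the EBMs so that they can be used with gradient-based optimizers, and we present a set of EBM architectures for learning generalizable distributions for manipulation tasks. We present multiple cases in which the EBM can be integrated for motion optimization and evaluate the performance of learned EBMs as guiding priors in both simulated and real robot experiments.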
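Autonomous robots should operate in real-world dynamic environments and collaborate with humans in tight spaces. A key component for allowing robots to leave structured lab and manufacturing settings is their ability to evaluate collisions with the world around them online and in real time. Distance-based constraints are fundamental for enabling robots to plan their actions and act safely, protecting both humans and their hardware. However, different applications require different distance resolutions, leading to various heuristic approaches for measuring distance fields w.r.t. obstacles, which are computationally expensive and hinder their application in dynamic obstacle avoidance use cases. We propose Regularized Deep Signed Distance Fields (ReDSDF), a single neural implicit function that can compute smooth distance fields at any scale, with fine-grained resolution over high-dimensional manifolds and articulated bodies like humans, thanks to our effective data generation and a simple inductive bias during training. We demonstrate the effectiveness of our approach in representative simulated tasks for whole-body control (WBC) and safe human-robot interaction (HRI) in shared workspaces. Finally, we provide a proof of concept of a real-world application in an HRI handover task with a mobile manipulator robot.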
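Robot assembly discovery (RAD) is a challenging problem that lives at the intersection of resource allocation and motion planning. The goal is to combine a predefined set of objects to form something new while considering task execution with the robot in the loop. In this work, we tackle the problem of building arbitrary, predefined target structures entirely from scratch using a set of Tetris-like building blocks and a robotic manipulator. Our novel hierarchical approach aims at efficiently decomposing the overall task into three feasible levels that benefit mutually from each other. On the high level, we run a classical mixed-integer program for global optimization of block-type selection and the blocks' final poses to recreate the desired shape. Its output is then exploited to efficiently guide the exploration of an underlying reinforcement learning (RL) policy. This RL policy draws its generalization properties from a flexible graph-based representation that is learned through Q-learning and can be refined with search. Moreover, it accounts for the necessary conditions of structural stability and robotic feasibility that cannot be effectively reflected in the previous layer. Lastly, a grasp and motion planner transforms the desired assembly commands into robot joint movements. We demonstrate our proposed method's performance on a set of competitive simulated RAD environments, showcase real-world transfer, and report performance and robustness gains compared to an unstructured end-to-end approach. Videos are available at https://sites.google.com/view/rl-meets-milp.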
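Mobile manipulation (MM) systems are ideal candidates for taking up the role of a personal assistant in unstructured real-world environments. Among other challenges, MM requires effective coordination of the robot's embodiments for executing tasks that require both mobility and manipulation. Reinforcement learning (RL) holds the promise of endowing robots with adaptive behaviors, but most methods require large amounts of data to learn a useful control policy. In this work, we study the integration of robotic reachability priors in actor-critic RL methods for accelerating the learning of MM for reaching and fetching tasks. Namely, we consider the problem of optimal base placement and the subsequent decision of whether to activate the arm for reaching a 6D target. For this, we devise a novel hybrid RL method that handles discrete and continuous actions jointly, resorting to the Gumbel-Softmax reparameterization. Next, we train a reachability prior using data from the operational robot workspace, inspired by classical methods. Subsequently, we derive Boosted Hybrid RL (BHyRL), a novel algorithm for learning Q-functions by modeling them as a sum of residual approximators. Every time a new task needs to be learned, we can transfer our learned residuals and learn the component of the Q-function that is task-specific, hence maintaining the task structure from prior behaviors. Moreover, we find that regularizing the target policy with a prior policy yields more expressive behaviors. We evaluate our method in simulation on reaching and fetching tasks of increasing difficulty, and we show the superior performance of BHyRL against baseline methods. Finally, we zero-transfer our learned 6D fetching policy with BHyRL to our MM robot TIAGo++. For more details and the code release, please refer to our project site: irosalab.com/rlmmbp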
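The Gumbel-Softmax reparameterization named in this abstract can be sketched as follows. This is a minimal NumPy illustration of the forward sampling step only, not the BHyRL code: the function name `gumbel_softmax`, the two-way "arm idle / activate arm" head, the logits, and the temperature are all assumptions made for the example. In an actual actor-critic setup the same computation would run inside an automatic-differentiation framework so that gradients can flow through the relaxed sample.

```python
import numpy as np


def gumbel_softmax(logits, tau, rng):
    """Draw one relaxed one-hot sample from the categorical given by logits."""
    logits = np.asarray(logits, dtype=float)
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))          # standard Gumbel(0, 1) noise
    z = (logits + gumbel) / tau           # temperature-scaled perturbed logits
    z = z - z.max()                       # stabilise the softmax numerically
    e = np.exp(z)
    return e / e.sum()


rng = np.random.default_rng(0)
# Hypothetical discrete head: index 0 = "keep arm idle", index 1 = "activate arm".
arm_logits = np.array([0.5, -0.5])
relaxed = gumbel_softmax(arm_logits, tau=0.5, rng=rng)
activate_arm = bool(np.argmax(relaxed) == 1)
```

Lower values of `tau` push the relaxed sample toward a one-hot vector, while higher values make it smoother. Taking the argmax of the perturbed logits recovers an exact categorical sample, which is why the relaxation can stand in for the discrete decision during training.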
Merging satellite products and ground-based measurements is often required for obtaining precipitation datasets that simultaneously cover large regions with high density and are more accurate than pure satellite precipitation products. Machine and statistical learning regression algorithms are regularly utilized in this endeavour. At the same time, tree-based ensemble algorithms for regression are adopted in various fields for solving algorithmic problems with high accuracy and low computational cost. The latter can constitute a crucial factor for selecting algorithms for satellite precipitation product correction at the daily and finer time scales, where the size of the datasets is particularly large. Still, information on which tree-based ensemble algorithm to select in such a case for the contiguous United States (US) is missing from the literature. In this work, we conduct an extensive comparison between three tree-based ensemble algorithms, specifically random forests, gradient boosting machines (gbm) and extreme gradient boosting (XGBoost), in the context of interest. We use daily data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and the IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also use earth-observed precipitation data from the Global Historical Climatology Network daily (GHCNd) database. The experiments refer to the entire contiguous US and additionally include the application of the linear regression algorithm for benchmarking purposes. The results suggest that XGBoost is the best-performing tree-based ensemble algorithm among those compared. They also suggest that IMERG is more useful than PERSIANN in the context investigated.
Many, if not most, systems of interest in science are naturally described as nonlinear dynamical systems (DS). Empirically, we commonly access these systems through time series measurements, where often we have time series from different types of data modalities simultaneously. For instance, we may have event counts in addition to some continuous signal. While by now there are many powerful machine learning (ML) tools for integrating different data modalities into predictive models, this has rarely been approached so far from the perspective of uncovering the underlying, data-generating DS (aka DS reconstruction). Recently, sparse teacher forcing (TF) has been suggested as an efficient control-theoretic method for dealing with exploding loss gradients when training ML models on chaotic DS. Here we incorporate this idea into a novel recurrent neural network (RNN) training framework for DS reconstruction based on multimodal variational autoencoders (MVAE). The forcing signal for the RNN is generated by the MVAE which integrates different types of simultaneously given time series data into a joint latent code optimal for DS reconstruction. We show that this training method achieves significantly better reconstructions on multimodal datasets generated from chaotic DS benchmarks than various alternative methods.
Quantifying motion in 3D is important for studying the behavior of humans and other animals, but manual pose annotations are expensive and time-consuming to obtain. Self-supervised keypoint discovery is a promising strategy for estimating 3D poses without annotations. However, current keypoint discovery approaches commonly process single 2D views and do not operate in the 3D space. We propose a new method to perform self-supervised keypoint discovery in 3D from multi-view videos of behaving agents, without any keypoint or bounding box supervision in 2D or 3D. Our method uses an encoder-decoder architecture with a 3D volumetric heatmap, trained to reconstruct spatiotemporal differences across multiple views, in addition to joint length constraints on a learned 3D skeleton of the subject. In this way, we discover keypoints without requiring manual supervision in videos of humans and rats, demonstrating the potential of 3D keypoint discovery for studying behavior.